Pitch is a complex hearing phenomenon that results from elicited and self-generated cochlear vibrations. 
Read-off vibrational information is relayed higher up the auditory pathway, where it is then condensed into 
pitch sensation. How this can adequately be described in terms of physics has largely remained an open 
question. We have developed a peripheral hearing system (in hardware and software) that reproduces with 
great accuracy all salient pitch features known from biophysical and psychoacoustic experiments. At the 
level of the auditory nerve, the system exploits stochastic resonance to achieve this performance, which may 
explain the large amount of noise observed in the working auditory nerve. 

Among all human sensors, the hearing system has resisted an accurate physical description the longest. 
Recent progress has revealed that hearing phenomena previously believed to be located in the CNS are 
consequences of the nonlinear physical properties of the cochlea 1. Here, in continuation of this work, we 
describe which physical principles are used to generate the biophysical and psychoacoustic hearing information 
along the hearing pathway up to the auditory nerve. 

From a physics point of view, the transduction of external sound towards the CNS involves three components: 
the hearing sensor (cochlea), the attached inner hair cells (IHC), and the auditory nerve neurons (ANN) (Fig. 1). 
In the following, we will present data exclusively from our software implementation of the compound device (for 
consistency), though our hardware implementation yields essentially indistinguishable results. Our Hopf 
cochlea 1 serves as the hearing sensor. The auditory input signal first passes a Hilbert transform to obtain the 
dimensionality required to drive Hopf systems that act as nonlinear amplifiers. The Hopf cochlea faithfully 
reflects mammalian sound processing (and beyond 4): strong enhancement of weak and compression of strong 
input signals, by high-gain active nonlinear input amplification. Phenomena emerging from this nonlinear 
behavior, like combination tone and two-tone suppression laws, provide important tests for corroborating the 
validity of the approach. Our Hopf cochlea has an intrinsic mesoscopic design: the frequency axis is discretized 
into a set of sections, each section modeling the nonlinear amplification process along a region of the basilar 
membrane. The discretization is flexible; here, one section covers approximately a quarter octave. Each section is 
endowed with properties of the passive hydrodynamic behavior and an active Hopf amplifier. The active part 
implements the Hopf normal form 5 



$$\dot{z} = (\mu + j)\,\omega_c z - \omega_c |z|^2 z - \omega_c F(t), \qquad z \in \mathbb{C}. \tag{1}$$



Here, the input F(t) and the output z are complex variables (j is the imaginary unit), and f_c = ω_c/2π is 
the characteristic frequency of the section. μ is the tunable parameter that defines each section's distance from the 
Hopf bifurcation point at μ = 0. Each section is composed of a Hopf amplifier followed by a section-specific 
6th-order Butterworth (low-pass) filter modeling the viscous fluid losses. For the results presented below, we use the 
parameters as in Ref. 1; we will display the responses of the frequency channels f_c = 1760 Hz and f_c = 440 Hz. The 
responses of this cochlea are in perfect agreement with biophysical measurements, for both amplitude and phase 
of the propagating signal (Supplementary Information of Ref. 1). 
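
The amplification property of a single active section can be illustrated with a minimal numerical sketch of Eq. (1). 
This is not the implementation of Ref. 1: the value μ = -0.01, the RK4 integrator, the step size and the function 
name are assumptions made for illustration, and the Butterworth filter and the coupling between sections are 
omitted. The drive is the analytic signal A·exp(jωt), i.e. what a Hilbert transform produces from A·cos(ωt). At 
resonance (ω = ω_c) and for μ < 0, the steady-state amplitude a of Eq. (1) satisfies a³ + |μ|a = A, so that weak 
inputs are amplified by about 1/|μ| while strong inputs follow the compressive law a ≈ A^(1/3).

```python
import cmath
import math


def hopf_steady_amplitude(drive_amplitude, f_in, f_c=440.0, mu=-0.01,
                          duration_s=0.3, steps_per_cycle=50):
    """Integrate Eq. (1) with classical RK4 and return |z| after the transient.

    mu, duration_s and steps_per_cycle are illustrative assumptions.
    """
    omega_c = 2.0 * math.pi * f_c
    omega_in = 2.0 * math.pi * f_in
    dt = 1.0 / (steps_per_cycle * max(f_c, f_in))

    def rhs(t, z):
        # Analytic (Hilbert-transformed) version of A*cos(omega_in*t).
        forcing = drive_amplitude * cmath.exp(1j * omega_in * t)
        return (mu + 1j) * omega_c * z - omega_c * abs(z) ** 2 * z - omega_c * forcing

    z, t = 0j, 0.0
    n_steps = int(duration_s / dt)
    tail_start = int(0.9 * n_steps)   # measure only the last 10% of the run
    tail_peak = 0.0
    for n in range(n_steps):
        k1 = rhs(t, z)
        k2 = rhs(t + 0.5 * dt, z + 0.5 * dt * k1)
        k3 = rhs(t + 0.5 * dt, z + 0.5 * dt * k2)
        k4 = rhs(t + dt, z + dt * k3)
        z += dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
        if n >= tail_start:
            tail_peak = max(tail_peak, abs(z))
    return tail_peak


weak = hopf_steady_amplitude(1e-4, 440.0)     # faint tone at f_c
strong = hopf_steady_amplitude(1e-1, 440.0)   # loud tone at f_c
gain_weak, gain_strong = weak / 1e-4, strong / 1e-1
```

With these assumed values the gain for the faint tone is far larger than for the loud one, which is the 
"enhancement of weak, compression of strong" behavior described above.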

We have complemented this cochlea with inner hair cells (IHC), where the cochlear membrane state V_Co(t) is 
linearly relayed to displacements u of the IHC cilia according to u(t) = 20 · 10^-9 · V_Co(t), which affect the IHC 
voltage V_IC according to the standard IHC model 6 (for the equations see the Inner hair cell (IHC) section of the 
Methods or the original article; we use the model's standard in vivo parameters). The dynamical role of the IHC is to 
half-wave rectify and slightly compress the signal: on top of a frequency-dependent DC component, the output 
now has a slightly low-pass filtered AC component 6. 

Figure 1 | Peripheral auditory signal pathway of an AM sound with f_car = 850 Hz and f_mod = 200 Hz. Responses 
evoked at a place corresponding to f_c = 880 Hz. Top row: physical signal; bottom row: Fourier spectrum 
representation. Stages: cochlear BM motion, inner hair cells (V_IC) (both continuous signals), ANN spike trains 
(of two characteristic classes, see below). 



The IHC signal feeds into the ANN. Biological ANN show widely divergent response properties. At first view, 
their extreme noisiness seems to work against their ability to convey precise hearing that crucially depends on 
precise timing and frequency. Our study will, however, reveal that the opposite is true and that there is a 
beneficial effect of noise. Biological ANN fall into two main classes 7-9 (cf. the Classes of auditory nerves (ANN) 
section of the Methods): high-spontaneous ANN fire at a high rate even in the absence of input, whereas ANN from 
the other class require substantial input for firing, with a tendency to phase-lock onto the signal involving a 
substantial degree of jitter. On a finer level, this second class is often divided into medium- and low-spontaneous 
ANN that mainly differ in their distances to firing threshold and maximal spike rate 7,9,10. Physiology is commonly 
held responsible for these differences: relative to the second ANN class, the first ANN class forms synapses only 
on distinct IHC sides 11 and preferentially projects to distinct locations in the cochlear nucleus 12. The two 
subclasses differ prominently in axon diameter. Every IHC connects to all ANN classes; a single ANN is contacted 
by only one of roughly 20 densely packed single synapses on an IHC tail 7. On both sides of the synaptic cleft, the 
interaction is by voltage-dependent ion channels. 

In our approach, the transmission from IHC to ANN is concatenated into a time-sampled ANN input I(t_n). This 
input is complemented by a strong contribution of noise, strongly correlated in time, to reflect the nature of the 
neurotransmitter release. As a result, we chose I(t_n) to have the form 



$$I(t_n) = A + B \cdot 20 \cdot \bigl(V_{IC}(t_n) - V_{IC,\mathrm{rest}}\bigr) + \xi(t_n), \tag{2}$$



where the constant A has the effect of a firing threshold and where B scales the IHC voltage to the evoked ANN 
current. ξ is exponentially correlated synaptic noise of intensity σ, independent for each transmission channel 
(we use the algorithm of Ref. 13, with a correlation time constant τ_σ [ms]; our paradigm would, however, work 
equally well with white noise, though at a synapse this would be less plausible). With this form of I(t_n), noise 
can trigger spontaneous ANN firing at low firing thresholds even in the absence of (other) input. 
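
A minimal sketch of how such an input could be generated is given below. It is not the algorithm of Ref. 13: 
we substitute the exact discrete-time update of an Ornstein-Uhlenbeck process, and we assume that the noise 
intensity σ can be read as the stationary standard deviation of the noise. The function names, the seed and the 
IHC voltages used in the usage lines are hypothetical; the time step of 0.05 ms corresponds to the 20 kHz 
sampling rate of the compound system.

```python
import math
import random


def exponentially_correlated_noise(n_samples, sigma, tau_ms, dt_ms=0.05, seed=0):
    """Ornstein-Uhlenbeck samples with stationary std `sigma` (assumed meaning of
    the noise intensity) and correlation time `tau_ms`; exact one-step update."""
    rng = random.Random(seed)
    decay = math.exp(-dt_ms / tau_ms)
    kick = sigma * math.sqrt(1.0 - decay * decay)
    xi = rng.gauss(0.0, sigma)          # start in the stationary distribution
    samples = []
    for _ in range(n_samples):
        samples.append(xi)
        xi = decay * xi + kick * rng.gauss(0.0, 1.0)
    return samples


def ann_input(v_ic, v_ic_rest, a_threshold, b_scale, noise):
    """Eq. (2): I(t_n) = A + B * 20 * (V_IC(t_n) - V_IC,rest) + xi(t_n)."""
    return [a_threshold + b_scale * 20.0 * (v - v_ic_rest) + xi
            for v, xi in zip(v_ic, noise)]


# Usage with the high-spontaneous standard values of Table I (sigma = 0.1, tau = 3 ms).
noise = exponentially_correlated_noise(200000, sigma=0.1, tau_ms=3.0)
# Hypothetical IHC voltage resting at -0.05 (arbitrary units): only A and noise remain.
resting_drive = ann_input([-0.05] * 5, -0.05, a_threshold=0.0, b_scale=1.0, noise=noise[:5])
```

Because each transmission channel receives independent noise, a separate seed per channel would be used in 
this sketch.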
The correlation time of the noise was determined by matching our approach with biological data 14 (cf. the ANN 
synaptic noise correlation time section of the Methods). Following the conjecture 15 that the distinguished 
postsynaptic potentials (sub-threshold for low-spontaneous, and super-threshold for high-spontaneous ANN) 
are the consequence of the different biological wiring, we use A = 0 for the high- and A = -0.2 for the low- and 
medium-spontaneous classes, and ensure that low- and medium-spontaneous ANN need, in addition to the 
continuous part of I(t), a noise contribution ξ(t) to cross the spiking threshold. For appropriate parameter values, 
the membrane potential x_n of Rulkov's spike-afterhyperpolarization neuron model 16 

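$$
x_{n+1} =
\begin{cases}
\dfrac{\alpha}{1 - x_n} + v, & x_n \le 0,\\[4pt]
\alpha + v, & 0 < x_n < \alpha + v \ \text{and}\ x_{n-1} \le 0,\\[2pt]
-1, & x_n \ge \alpha + v \ \text{or}\ x_{n-1} > 0,
\end{cases}
\qquad v := y^{rs} + \beta^{hp} y_n + \beta^{e} I_n,
$$

$$
y_{n+1} = \gamma^{hp} y_n -
\begin{cases}
g^{hp}, & \text{if the $n$-th iteration spiked},\\
0, & \text{otherwise},
\end{cases}
\tag{3}
$$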



reproduces the characteristic biological spike trains of the different ANN classes indistinguishably from biology. 
In this model, y_n is a slow hyperpolarizing current, whereas the constant y^rs defines the resting potential. I_n 
represents the external driving current. A spike is generated every time x_n attains its maximum value. Spike 
frequency and spike strength are controlled by the parameters γ^hp and g^hp. Upon constant input current, the 
nonlinear function x_{n+1} = f(...) generates a (jittered) limit-cycle behavior. We use the parameter values 
α = 3.8, y^rs = -2.9, β^hp = 0.5, g^hp = 0.1 and β^e = 0.1 16 and modify the original timescale by a factor of ten. 
This yields a sampling rate of 20 kHz that is maintained throughout the compound system, to account for very fast 
spiking ANN, and generates an almost linear I-f curve 16. The typical responses of the three ANN classes (cf. 
Ref. 10 and the ANN rate versus level curves section of the Methods) are reproduced in Fig. 2 by stimulating the 
map with a single tone at f_c for varying input intensity at one of the three standard parameter sets of Table I 
(black lines). The colored lines contained in the figures demonstrate that all biologically observed profiles can be 
generated by the model by sweeping the parameters across intervals around the standard values, without ever 
running into non-physiological responses. 

Figure 2 | ANN response classes. Upper panel: Black lines from the standard parameter values of Table I. Colored 
lines are from the bracketed values of parameter A (red), B (green), σ (purple) and τ_σ (orange). Spontaneous 
rates: B = 0 (blue). Lower panel: Corresponding ANN spike rates (f_c = 1760 Hz). Cochlear information is relayed 
into ANN spike rates that take care of different dynamic ranges, but preserve the essentials of the cochlear signal 
(here on linear spike rate scale, in Fig. 3 last row on logarithmic scale). 
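
The map of Eq. (3) can be iterated directly. The following is a minimal sketch under stated assumptions, not 
the implementation used for the figures: the input current is held constant and noise-free, the branch 
boundaries are treated as non-strict where the printed inequalities are ambiguous, and "spiked" is taken to 
mean that the map has just been set to its maximum value α + v. The parameter defaults are the values quoted 
above, with γ^hp = 0.97 from the high-spontaneous row of Table I; the function name and the test currents are 
hypothetical.

```python
def rulkov_spike_count(current, n_steps=20000, alpha=3.8, y_rs=-2.9,
                       beta_hp=0.5, beta_e=0.1, gamma_hp=0.97, g_hp=0.1):
    """Iterate the map of Eq. (3) for a constant input current and count spikes.

    At the 20 kHz sampling rate, 20000 steps correspond to one second.
    """
    x_prev, x, y = -1.0, -1.0, 0.0
    spikes = 0
    for _ in range(n_steps):
        v = y_rs + beta_hp * y + beta_e * current
        if x <= 0.0:
            x_next, spiked = alpha / (1.0 - x) + v, False
        elif x < alpha + v and x_prev <= 0.0:
            x_next, spiked = alpha + v, True      # spike: x attains its maximum
        else:
            x_next, spiked = -1.0, False          # reset after the spike
        # Slow hyperpolarizing current: decays, and is kicked down by each spike.
        y = gamma_hp * y - (g_hp if spiked else 0.0)
        x_prev, x = x, x_next
        if spiked:
            spikes += 1
    return spikes


silent = rulkov_spike_count(0.0)    # v stays just below the firing threshold
slow = rulkov_spike_count(1.0)
fast = rulkov_spike_count(3.0)
```

For zero current the map rests at x = -1, just below the saddle-node point v = 1 - 2√α of the first branch, so that 
any small positive input (or noise) triggers firing; larger currents yield higher spike counts.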

Results 

Compound model performance. Across the different stages of the compound system, the cochlear information is 
essentially preserved. In Fig. 3, the outputs of the Hopf cochlea (top panel), of the IHC (second and third panels) 
and of the ANN (lowest panel) are shown, for two frequency channels. For the experiments, a single tone with 
fixed amplitude was fed into the Hopf cochlea, sliding input frequencies from 0.2 f_c to 1.5 f_c. To cover an input 
range from -60 dB_1V up to -10 dB_1V, the experiment was repeated in steps of 10 dB. At the Hopf cochlea, the 
amplitude of the (single-tone) oscillation was measured; at the IHC, the amplitudes of both the AC and the DC 
components were measured. At the level of the ANN, the amplitude of the neuronal firing was measured in terms 
of spike rates. From these measurements it follows that all essential features of the mammalian cochlea are 
faithfully reproduced. The most prominent, easily verifiable ones are the strong amplification of faint sounds, the 
compressive nonlinearity of exponent one-third, the left-shift of the response peaks upon an increase of the input 
amplitude, and the characteristic broadening of particularly the low frequencies for low input amplitudes 1,3,5. 
IHC low-pass filtering (cf. Fig. 3, f_c = 1760 Hz) is accompanied by strong input sound compression (cf. f_c = 
440 Hz) 6. Upon feeding the IHC signal into the ANN, spikes recover the original quality of the cochlear response 
(Fig. 3, last row, for high-spontaneous ANN). High-spontaneous neurons show a quicker saturation for loud 
sounds, whereas low-spontaneous neurons only respond above an input intensity of 30 dB, thereby taking care of 
different dynamic ranges. On the linear spike rate scale, each class faithfully transmits the essentials of the Hopf 
cochlea output (Fig. 2, last panel vs. Fig. 3, first panel), but each on a dynamic range of its own. On the typical 
dynamic ranges transmitted, all three neuron classes fully retain the cochlear information (Fig. 2). Generated 
tuning curves (iso-intensity tuning curves are an often used alternative to characterize auditory response) for BM 
cochlea motion and for the different ANN are very similar (cf. the Cochlea and ANN tuning curves section of the 
Methods). Moreover, they agree with the biological data 9,17 that serve as the guideline for a faithful 
transduction from cochlea to CNS 17. 
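
As a small aid for reading the stimulation levels, the sketch below converts them to amplitudes. It assumes that 
the level unit denotes decibels relative to 1 V, a reading that is consistent with the caption of Fig. 3, which 
equates the lowest level of -60 dB with 1 mV. The function names are ours and purely illustrative.

```python
import math


def db1v_to_volts(level_db):
    """Amplitude in volts for a level in dB re 1 V (assumed reading of the unit)."""
    return 10.0 ** (level_db / 20.0)


def volts_to_db1v(amplitude_v):
    """Inverse conversion: level in dB re 1 V for an amplitude in volts."""
    return 20.0 * math.log10(amplitude_v)


sweep_levels_db = list(range(-60, 0, 10))                # -60, -50, ..., -10
sweep_amplitudes_v = [db1v_to_volts(level) for level in sweep_levels_db]

# Under a compressive law output ~ input**(1/3), a 10 dB input step
# raises the output level by only 10/3 dB.
compressed_step_db = 10.0 / 3.0
```

Under this reading, the six curves of a sweep span input amplitudes from 1 mV to about 0.32 V.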

Suprathreshold stochastic resonance. To what extent the presence of noise plays a distinguished role in 
achieving this performance, we shall exhibit by a pitch-shift experiment 18,19. If an AM sound with 
f_car = 850 Hz and f_mod = 200 Hz is fed into the cochlea (at an amplitude of -17 dB_1V), this corresponds to a 
pitch-shift experiment with f_0 = 200 Hz, k = 4 and δf = 50 Hz, generating a 


Table I | Parameter values of the three ANN classes. Values in brackets correspond to the parameter variations in 
Fig. 2 exhibited by colored lines 

            A               B              γ^hp    σ              τ_σ [ms]
Hi-spont     0   (-0.02)    1    (0.8)     0.97    0.1  (0.2)     3 (5)
Me-spont    -0.2 (-0.25)    1.25 (1.15)    0.5     0.06 (0.04)    3 (5)
Lo-spont    -0.2 (-0.25)    1.05 (1.15)    0.5     0.04 (0.06)    3 (5)

Figure 3 | Output amplitudes (logarithmic scale) as a function of input frequency (linear scale). Lines represent 
constant cochlear input intensities, from -60 dB_1V = 1 mV (lowest) to -10 dB_1V (uppermost line), in steps of 
10 dB. The characteristic cochlear information is preserved across the different stages of transcription. 



perceived pitch f_p = f_0 + δf/k = 212.5 Hz, equivalent to a period of 4.7 ms. Fig. 5 shows measurements taken at 
f_c = 880 Hz, in the regime where the perceived pitch (measured as the first most prominent peak of the ISI 
distribution) is known to follow de Boer's first pitch shift rule. In the absence of noise, high-spontaneous neurons 
would quickly lock onto the signal, i.e., onto the modulation frequency (200 Hz, 5 ms). It is only upon the addition 
of noise that a distribution with a main peak at the perceived pitch f_p emerges (Fig. 4). Sets of low-medium 
spontaneous neurons (that 

Figure 4 | Suprathreshold stochastic resonance of high/low-medium spontaneous ANN (upper/lower panel). 
(a) Spike trains of one/four neuron(s), (b) instantaneous spiking frequency distribution at the indicated noise 
level, (c) probability p for the instantaneous frequency to coincide with the frequency of the perceived pitch f_p 
for variable noise levels σ. 



cannot directly encode f_p in their instantaneous frequencies), when driven by identical signals but independent 
noise, generate an almost regular spike pattern, with a clear instantaneous frequency peak at the locus of the 
perceived pitch f_p (the "volley principle" of auditory nerve coding). Fig. 4 demonstrates that this encoding of the 
perceived pitch in ANN spike rates requires a nonzero amount of noise. More details of how this is achieved and 
how well the required noise coincides with the one observed in biological measurements can be found in the ANN 
stochastic resonance details section of the Methods. Clearly, our simple median-based measure p(σ) neither 
takes account of more global properties of the distribution, nor of how the pitch is finally extracted from the ANN 
(which may be the origin of the minor mismatch between the optimal noise in Table I vs. the optimal noise in 
Fig. 4), but otherwise our observations are very stable and consistent. 
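
The pitch values quoted for this experiment follow from simple arithmetic, spelled out in the sketch below. The 
first function is de Boer's first rule for an AM sound as used above (f_p = f_0 + δf/k with f_0 = f_mod); the 
second is the two-tone form of the same rule, f_p = (f_1 + f_2)/(2k + 1), as quoted in the caption of Fig. 5. The 
function names are ours.

```python
def first_pitch_shift_am(f_car, f_mod, k):
    """de Boer's first rule for an AM sound: f_p = f_0 + delta_f / k, with f_0 = f_mod."""
    delta_f = f_car - k * f_mod          # detuning of the carrier from the k-th harmonic
    return f_mod + delta_f / k


def first_pitch_shift_two_tone(f_1, f_2, k):
    """Two-tone form of the first rule: f_p = (f_1 + f_2) / (2k + 1)."""
    return (f_1 + f_2) / (2 * k + 1)


f_p = first_pitch_shift_am(850.0, 200.0, 4)     # 212.5 Hz
period_ms = 1000.0 / f_p                        # about 4.7 ms
two_tone_k5 = first_pitch_shift_two_tone(900.0, 1100.0, 5)   # about 181.8 Hz
```

For f_1 = 900 Hz and k = 5 the first rule predicts about 181.8 Hz, whereas the caption of Fig. 5 places the 
corresponding psychoacoustic cross at about 178 Hz; this systematic deviation is the second pitch shift effect.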

As a final test, the biologically faithful transduction of the cochlear signal to the CNS is exhibited in the 
reproduction of psychoacoustic pitch phenomena. First, three-frequency stimulation of our auditory system is 
shown to give rise to a pitch sensation that indeed follows de Boer's first pitch shift rule, for all detunings δf 
(Fig. 5a). Second, and 

Figure 5 | Perceived pitch from the peak in the ISI-histogram of high-spontaneous ANN (full dots). (a) AM sounds 
at f_car = 800 + δf, f_mod = 200 Hz, noise level σ = 0.07; dotted line: f_p = 200 + δf/4 Hz (the first pitch shift 
effect). Results scale correctly with δf. (b) Two-frequency stimulation f_1 and f_2 = f_1 + 200 Hz (input amplitude 
A_0 = -25 dB_1V, scaling factor B = 3.0, cochlea output at f_c = 622 Hz). The first pitch-shift law 
f_p = (f_1 + f_2)/(2k + 1) at k = 4 (dotted line) is coherently violated (second pitch shift effect), by 
psychoacoustic data (crosses 22) as well as by our measured data (full dots). Inset: ISI-histogram for 
f_1 = 900 Hz showing the perceived pitches for k = 4 (left peak, for the rightmost cross) and for k = 5 (right peak, 
cross at f_p ≈ 178 Hz, out of display). 

Figure 6 | Bimodal spontaneous firing histogram from Ref. 8, leading to the distinction of the three classes of 
ANN: (a) low-, (b) medium-, (c) high-spontaneous ANN. The emergent main classification comprises the 
high-spontaneous and the concatenated medium/low-spontaneous class. 

even more importantly, Smoorenburg's psychoacoustic two-frequency stimulation data 22 are perfectly recovered 
by performing the corresponding pitch-shift experiment in the artificial system (Fig. 5b, de Boer's second pitch 
shift effect). 

rate = 76 spikes/s). Lines indicate identical slopes. 

Figure 8 | Biological 'rate vs. level' data by Ref. 10 (Fig. 2 top row). 

Discussion 

From the peripheral hearing system, ever more details are known of the parts involved. How these parts 
functionally work together, however, has remained a challenge. Our full model of the peripheral 
hearing system is based on the principles of nonlinear physics and 
includes in a detailed manner the facts known from biology. Our model is in a sense minimal: the designs of the 
cochlea, the inner hair cells, and the auditory nerve neurons are extremely simple. Yet, our model not only 
reproduces all salient biological measurements to great accuracy, it also emphasizes the important role of synaptic 
noise in the transmission of salient hearing features, from the continuous basilar membrane motion to the discrete 
spiking world of the CNS. We demonstrated on the basis of physics that all nonlinear features measured at the 
auditory nerve can indeed be traced back to the active amplification process within the cochlea, a conclusion 
made previously on the basis of physiology 23. A novel observation is that suprathreshold stochastic resonance 
seems to be necessary to enable the correct transition from IHC to ANN. 

Our work could be seen as a next step following the modeling of Ref. 24, where Hopf elements at their bifurcation 
points, with no biological interaction among them and with hair cells reduced to abstract threshold oscillators, 
reproduced the first pitch shift effect. From our more detailed modeling, we observe here the natural emergence 
of the second pitch shift effect across the peripheral auditory pathway from the cochlea to the auditory nuclei. 
The straightforward reproduction of the cochlea-generated psychoacoustic second pitch shift by our peripheral 
hearing model corroborates that pitch sensation has its origins in the cochlear nonlinearities, and that the 
peripheral auditory system takes surprising care to pass on the cochlear nonlinearities to the CNS. We also 
provide an important example of, and argument for, the omnipresence of noise in the nervous system. Audition is 
a particularly intriguing place for such an observation, as the mammalian hearing system is famous for its high 
temporal precision and reliability. In this sense, our approach opens the perspective upon a novel construction 
paradigm for high-precision information processing based on noisy elements that circumvents bottlenecks 
encountered by current technology, particularly in chip design. Beyond this and offering new insights in hearing 
research, our model can serve as a template for faithfully transducing continuous into discrete-time systems, 
exceeding conventional high-frequency sampling methods in efficacy and robustness. Due to its simple biological 
blueprint, it was also simple to realize the model in hardware, which yielded virtually coincident results. 

Work supported by the Swiss National Science Foundation 
(Grants 200020-132881, 200021-122276 to R.S). 

Methods 

Peripheral auditory system implementation details. Here, we provide more details 
on the implementation of the inner hair cells (IHC), on our auditory nerve neuron 
(ANN) model, and on the nature of the stochastic resonance exhibited in the main 
manuscript. 

A. Inner hair cell (IHC). We relayed cochlear membrane states V_Co(t) linearly to IHC cilia displacements, using 
u(t) = 20 · 10^-9 · V_Co(t). Cilia displacements cause the IHC voltage V_IC to change according to the IHC model 
developed originally by Eustaquio-Martin and Lopez-Poveda 6: 

$$\frac{dV_{IC}}{dt} = \frac{1}{C_A + C_B}\Bigl[\,\ldots\, + \bigl(E'_{K,f} - V_{IC}\bigr)\, g_{K,f}(V_{IC}) + \bigl(E'_{K,s} - V_{IC}\bigr)\, g_{K,s}(V_{IC})\Bigr],$$

where 

$$g_m(u(t)) = G_m\Bigl(1 + e^{(u_0 - u(t))/s_0}\bigl(1 + e^{(u_1 - u(t))/s_1}\bigr)\Bigr)^{-1},$$

$$g_{K,f/s}(V_{IC}) = G_{f/s}\Bigl(1 + e^{(V_{1,f/s} - V_{IC})/S_{1,f/s}}\bigl(1 + e^{(V_{2,f/s} - V_{IC})/S_{2,f/s}}\bigr)\Bigr)^{-1},$$

and where g_L is the constant leak conductance. Unchanged in vivo parameters of the model were used. 
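
The conductances of the IHC model share one functional form, a second-order Boltzmann function of the type 
G (1 + exp((x_0 - x)/s_0)(1 + exp((x_1 - x)/s_1)))^-1. The sketch below evaluates this form together with the linear 
relay u(t) = 20 · 10^-9 · V_Co(t). The numerical parameters are placeholders chosen only to show the sigmoidal 
shape; they are not the in vivo values of Ref. 6, and the function names are ours.

```python
import math


def second_order_boltzmann(x, g_max, x0, s0, x1, s1):
    """g_max / (1 + exp((x0 - x)/s0) * (1 + exp((x1 - x)/s1)))."""
    return g_max / (1.0 + math.exp((x0 - x) / s0) * (1.0 + math.exp((x1 - x) / s1)))


def cilia_displacement(v_co):
    """Linear relay from cochlear membrane state to cilia displacement."""
    return 20e-9 * v_co


# Placeholder parameters (NOT taken from Ref. 6); displacements in metres.
G_M, U0, S0, U1, S1 = 1.0, 30e-9, 50e-9, 40e-9, 20e-9

displacements = [cilia_displacement(v) for v in (-10.0, -1.0, 0.0, 1.0, 10.0, 50.0)]
conductances = [second_order_boltzmann(u, G_M, U0, S0, U1, S1) for u in displacements]
```

The conductance rises monotonically with displacement and saturates at G_M; its asymmetry around rest is what 
makes the stage act as a half-wave rectifier on the oscillating input.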




Figure 9 | (a) Iso-intensity tuning curves from the cochlea (black lines, where the input amplitudes lead to a 
cochlear oscillation of -10 dB_1V) and from the compound peripheral auditory system model (red lines, where 
numbers 1-3 indicate the ANN class from which the measurements were taken). Input amplitudes led to 
(1) 10 spikes/s (low-spontaneous class), (2) 10 spikes/s (medium-spontaneous class), (3) 10 spikes/s additional to 
the spontaneous spike rate (high-spontaneous class). (b) Biological data 10, from high-spontaneous ANN. 
Measurements at the cochlea have been reported to yield coincident response profiles 17. 

Figure 10 | Perceived pitch f_p from the ISI-distribution peak, at different noise levels σ = 0, 0.03, 0.08. 
(a) High-spontaneous ANN completely lock to the modulation frequency f_mod = 200 Hz (ISI peak around 5 ms) in 
the absence of noise. Upon increasing the noise towards the biological level, the perceived pitch f_p is 
implemented and, for higher than biological noise, f_p is lost again. (b) Medium-spontaneous ANN: Here, f_p is 
implemented soon after statistical firing is enabled (around σ = 0.02). Upon higher noise levels, f_p is gradually 
lost. 



B. Classes of auditory nerves (ANN). The spontaneous spike rates of auditory nerve 
neurons exhibit a distinct bimodal distribution 8 , see Fig. 6. Taking into account 
coherent morphological, physiological and functional viewpoints, ANN are 
conventionally divided into a low, a medium, and a high spontaneous spike class. Our 
modeling closely follows the distinction between these classes. 

ANN synaptic noise correlation time. Due to the biological mechanism of synaptic transmission, synaptic noise 
will be correlated in time. A finite correlation time is particularly evident for high-spontaneous ANN, which spike 
intensively even in the absence of input. To find the biologically justified correlation, we compared 
model-generated ISI distributions from high-spontaneous ANN to corresponding animal data 14, see Fig. 7. An 
exact match of the exponential decay is found for a correlation time τ_σ = 3 ms. This value represents a typical 
synaptic time scale (see e.g. Ref. 25). Smaller/larger correlation times lead to a faster/slower decay of the 
Poisson-like distributions. 

ANN rate versus level curves. Our modeling results (Fig. 2) almost indistinguishably reproduce the biological 
'rate vs. level' data (Fig. 8, from Ref. 10). 

C. Cochlea and ANN tuning curves. Auditory response is often characterized by 'iso-intensity tuning curves', for 
both cochlear oscillation amplitudes and auditory nerve spike rates, which were found to have a very similar 
characteristic form 9,17. Also in our setting, the emergent tuning curves of the cochlear oscillations and those of 
auditory nerve neuron spiking display the same qualitative shape that, moreover, coincides with the measured 
biological data (Fig. 9). 

D. ANN stochastic resonance details. According to our modeling, which is based on carefully chosen parameter 
values, a nonzero amount of noise is, intriguingly, required for correct pitch transduction. The working mechanism 
is illustrated by ISI histograms from high- and medium-spontaneous ANN (Fig. 10). 

For vanishing noise level σ → 0, the limit-cycle high-spontaneous ANN lock to the modulation frequency (ISI close 
to 5 ms), and thereby cannot transduce the perceived pitch (1/f_p = 4.7 ms). Turning the noise on leads to a 
broadened and shifted distribution, until, from σ = 0.03, the peak of the distribution is in the vicinity of 4.7 ms. 
Increasing the noise level further flattens the ISI distribution (i.e. decreases the signal-to-noise ratio) and 
eventually shifts the distribution peak beyond the perceived pitch. Note that changing the coupling strength alone 
(B ≠ 1 in Eq. (2)) would fail to yield the correct pitch (e.g., B = 0.5 yields peaks from the interval (6, 8.5) ms). At 
'biological' parameters, single medium-spontaneous ANN are unable to reproduce the perceived pitch. Therefore, 
a 'volley' setting comprising n = 4 neurons is considered. Quite soon after noise enables statistical spiking, the 
correct pitch frequency is transduced, until, at too high noise levels, it is gradually lost. Low-spontaneous ANN 
behave qualitatively identically to medium-spontaneous ANN. 
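
The read-out used throughout, the perceived pitch as the most prominent peak of the ISI histogram, can be 
sketched in a few lines, including the pooling of several neurons in a 'volley' setting. This is an illustration 
under assumptions rather than our analysis code: spike times are given as integer sample indices at the 20 kHz 
sampling rate, the histogram bin width is one sample, ties are resolved in favor of the shorter interval, and the 
spike trains in the usage lines are synthetic.

```python
from collections import Counter


def pitch_from_isi(spike_samples, fs_hz=20000.0):
    """Pitch estimate (Hz) from the most frequent inter-spike interval."""
    isis = [b - a for a, b in zip(spike_samples, spike_samples[1:])]
    counts = Counter(isis)
    # Highest count wins; among equal counts prefer the shorter interval.
    peak_isi, _ = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return fs_hz / peak_isi


def pool_volley(spike_trains):
    """Merge the spike trains of several neurons into one sorted train."""
    return sorted(s for train in spike_trains for s in train)


# Synthetic example: two neurons that each fire only every second cycle
# (188 samples = 9.4 ms), shifted by one cycle (94 samples = 4.7 ms).
neuron_a = [188 * i for i in range(20)]
neuron_b = [94 + 188 * i for i in range(20)]

single_pitch = pitch_from_isi(neuron_a)                          # ~106 Hz
volley_pitch = pitch_from_isi(pool_volley([neuron_a, neuron_b]))  # ~213 Hz
```

In this synthetic case neither neuron alone carries the interval of 4.7 ms, but the pooled train does, which is the 
essence of the volley principle invoked for the medium- and low-spontaneous classes.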